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We consider the high energy physics unfolding problem where the 
goal is to estimate the spectrum of elementary particles given obser¬ 
vations distorted by the limited resolution of a particle detector. This 
important statistical inverse problem arising in data analysis at the 
Large Hadron Collider at CERN consists in estimating the intensity 
function of an indirectly observed Poisson point process. Unfolding 
typically proceeds in two steps: one first produces a regularized point 
estimate of the unknown intensity and then uses the variability of 
this estimator to form frequentist confidence intervals that quantify 
the uncertainty of the solution. In this paper, we propose forming the 
point estimate using empirical Bayes estimation which enables a data- 
driven choice of the regularization strength through marginal max¬ 
imum likelihood estimation. Observing that neither Bayesian credi¬ 
ble intervals nor standard bootstrap confidence intervals succeed in 
achieving good frequentist coverage in this problem due to the inher¬ 
ent bias of the regularized point estimate, we introduce an iteratively 
bias-corrected bootstrap technique for constructing improved confi¬ 
dence intervals. We show using simulations that this enables us to 
achieve nearly nominal frequentist coverage with only a modest in¬ 
crease in interval length. The proposed methodology is applied to 
unfolding the Z boson invariant mass spectrum as measured in the 
CMS experiment at the Large Hadron Collider. 

1. Introduction. This paper studies a generalized linear inverse problem 
[Bochkina (2013)], called the unfolding problem [Prosper and Lyons (2011), 
Cowan (1998), Blobel (2013)], arising in data analysis at the Large Hadron 
Collider (LHC) at CERN, the European Organization for Nuclear Research. 
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The LHC is the world’s largest and most powerful particle accelerator. It 
collides two beams of protons in order to study the properties and inter¬ 
actions of elementary particles produced in such collisions. The trajectories 
and energies of these particles are recorded using gigantic underground par¬ 
ticle detectors and the vast amounts of data produced by these experiments 
are analyzed in order to draw conclusions about fundamental laws of physics. 
Due to their complex structure and huge quantity, the analysis of these data 
poses significant statistical and computational challenges. 

Experimental high energy physicists use the term “unfolding” to refer to 
correcting the distributions measured at the LHC for the limited resolu¬ 
tion of the particle detectors. Let X be some physical quantity of interest 
measured in the detector. This could, for example, be the energy, mass or 
production angle of a particle. Due to the noise induced by the detector, we 
are only able to observe a stochastically smeared or folded version Y of this 
quantity. As a result, the observed distribution of Y is a “blurred” version 
of the true, physical distribution of X and the task is to use the observed 
values of Y to estimate the distribution of X. Each year, the experimental 
collaborations working with LHC data produce dozens of physics results that 
make use of unfolding. Recent examples include studies of the characteristics 
of jets [Chatrchyan et al. (2012a)], the transverse momentum distribution 
of W bosons [Aad et al. (2012)] and charge asymmetry in top-quark pair 
production [Chatrchyan et al. (2012b)], to name a few. 

The main challenge in unfolding is the ill-posedness of the problem in the 
sense that a simple inversion of the forward mapping from the true space into 
the smeared space is unstable with respect to small perturbations of the data 
[Engl, Hanke and Neubauer (2000), Kaipio and Somersalo (2005), Panaretos 
(2011)]. As such, the trivial maximum likelihood solution of the problem 
often exhibits spurious high-frequency oscillations. These oscillations can be 
tamed by regularizing the problem, which is done by taking advantage of 
additional a priori knowledge about plausible solutions. 

An additional complication is the non-Gaussianity of the data which fol¬ 
lows from the fact that both the true and the smeared observations are 
realizations of two interrelated Poisson point processes, which we denote 
by M and N, respectively. As such, unfolding is an example of a Poisson 
inverse problem [Antoniadis and Bigot (2006), Reiss (1993)], where the in¬ 
tensity function / of the true process M is related to the intensity function 
g of the smeared process N via a Fredholm integral operator K, that is, 
g = Kf , where K represent the response of the detector. The task at hand 
is then to estimate and make inferences about the true intensity / given a 
single observation of the smeared process N. Due to the Poisson nature of 
the data, many standard techniques based on a Gaussian likelihood function, 
such as Tikhonov regularization, are only approximately valid for unfolding. 
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Furthermore, estimators properly taking into account the Poisson distribu¬ 
tion of the observations are rarely available in a closed form, making the 
problem computationally challenging. 

At present, the unfolding methodology used in LHC data analysis is not 
well established [Lyons (2011)]. The two main approaches are the expectation- 
maximization (EM) algorithm with an early stopping [D’Agostini (1995), 
Vardi, Shepp and Kaufman (1985), Lucy (1974), Richardson (1972)] and a 
certain variant of Tikhonov regularization [Hocker and Kartvelishvili (1996)]. 
In high energy physics (HEP) terminology, the former is called the D ’Agostini 
iteration and the latter, somewhat misleadingly, SVD unfolding (with SVD 
referring to the singular value decomposition). In addition, a HEP-specific 
heuristic, called bin-by-bin unfolding, which provably accounts for smearing 
effects incorrectly through a multiplicative efficiency correction, has been 
widely used. Recently, Choudalakis (2012) proposed a Bayesian solution to 
the problem, but this seems to have seldom been used in practice thus far. 

The main problem with the D’Agostini iteration is that it is difficult to 
give a physical interpretation to the regularization imposed by early stop¬ 
ping of the iteration. SVD unfolding, on the other hand, ignores the Poisson 
nature of the observations and does not enforce the positivity of the solution. 
Furthermore, both of these methods suffer from not dealing with two signif¬ 
icant issues satisfactorily: (1) the choice of the regularization strength and 
(2) quantification of the uncertainty of the solution. The delicate problem of 
choosing the regularization strength is handled in most LHC analyses using 
nonstandard heuristics or, in the worst-case scenario, by simply fixing some 
value “by hand.” When quantifying the uncertainty of the unfolded spec¬ 
trum, the analysts form approximate frequentist confidence intervals using 
simple error propagation, but little is known about the coverage properties 
of these intervals. 

In this paper, we propose new statistical methodology aimed at addressing 
the two above-mentioned issues in a more satisfactory manner. Our main 
methodological contributions are as follows: 

1. Empirical Bayes selection of the regularization parameter using a Monte 
Carlo expectation-maximization algorithm [Casella (2001), Geman and Mc¬ 
Clure (1985, 1987), Saquib, Bournan and Sauer (1998)]; 

2. Frequentist uncertainty quantification using a combination of an it¬ 
erative bias-correction procedure [Kuk (1995), Goldstein (1996)] and boot¬ 
strap percentile intervals [Efron and Tibshirani (1993), Davison and Hinkley 
(1997)]. 

To the best of our knowledge, neither of these techniques has been previously 
used to solve the HEP unfolding problem. Our framework also properly 
takes into account the Poisson distribution of the observations, enforces 
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the positivity constraint of the unfolded spectrum and imposes a curvature 
penalty on the solution with a clear physical interpretation. 

It is helpful to think of the unfolding problem as consisting of two separate, 
but related, subproblems: one of point estimation and the other of uncer¬ 
tainty quantification. We follow the standard practice of first constructing 
a point estimate of the unknown intensity and then using the variability of 
this point estimate to form frequentist confidence intervals. For the point 
estimation part, the main challenge is to decide how to regularize the ill- 
posed problem and, in particular, how to choose the regularization strength. 
Classical, well-known techniques for making this choice include the Morozov 
discrepancy principle [Morozov (1966)] and cross-validation [Stone (1974)]. 
Bardsley and Goldes (2009) study these techniques in the context of Poisson 
inverse problems, while Veklerov and Llacer (1987) provide an alternative ap¬ 
proach based on statistical hypothesis testing. From a Bayesian perspective, 
the problem can be addressed using a Bayesian hierarchical model [Kaipio 
and Somersalo (2005)] which necessitates the choice of a hyperprior for the 
regularization parameter. On the other hand, empirical Bayes selection of 
the regularization parameter using the marginal likelihood, which has the 
key advantage of not requiring the specification of a hyperprior, has received 
relatively less attention in the inverse problems literature. However, in many 
other fields of statistical inference, such as semiparametric regression [Wood 
(2011), Ruppert, Wand and Carroll (2003), Section 5.2], neural networks 
[Bishop (2006), Sections 3.5 and 5.7.2] and Gaussian processes [Rasmussen 
and Williams (2006), Chapter 5], the use of empirical Bayes techniques has 
become part of standard practice. The approach we follow bears similarities 
to that of Saquib, Bournan and Sauer (1998), where the marginal maxi¬ 
mum likelihood estimator is used to select the regularization parameter in 
tomographic image reconstruction with Poisson data. 

Once we have formed an empirical Bayes point estimate of the unknown 
intensity function, we would like to use the variability of this point estimator 
to quantify the uncertainty of the solution. In high energy physics, frequen¬ 
tist confidence statements are generally preferred over Bayesian alternatives 
[Lyons (2013)] and we would hence like to obtain confidence intervals with 
good frequentist coverage properties. To achieve this, the main challenge 
comes from the bias that is present in the point estimate in order to regu¬ 
larize the otherwise ill-posed problem. We show using simulations that this 
bias can seriously degrade the coverage probability of both Bayesian credible 
intervals and standard bootstrap confidence intervals. We propose to rem¬ 
edy this problem by employing an iterative bias-correction technique [Kuk 
(1995), Goldstein (1996)] and then using the variability of the bias-corrected 
point estimate to form the confidence intervals. Quite remarkably, our sim¬ 
ulation results indicate that such a technique achieves close-to-nominal cov¬ 
erage with only a modest increase in the length of the intervals. 
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The paper is structured as follows. Section 2 gives a brief overview of the 
LHC detectors and explains the role that unfolding plays in these experi¬ 
ments. We then formulate in Section 3 a forward model for the unfolding 
problem using Poisson point processes. The proposed statistical methodol¬ 
ogy is explained in detail in Section 4 which forms the backbone of this 
paper. This is followed by simulation studies in Section 5 and a real-world 
data analysis scenario in Section 6 which consists of unfolding of the Z boson 
invariant mass spectrum measured in the CMS experiment at the LHC. We 
close the paper with some concluding remarks in Section 7. We also invite 
the reader to consult the online supplement [Kuusela and Panaretos (2015)] 
which provides further simulation results and some technical details. 

2. LHC experiments and unfolding. 

2.1. Overview of LHC experiments. The Large Hadron Collider is a 
27 km long circular proton-proton collider located in an underground tunnel 
at CERN in Geneva, Switzerland. With proton-proton collisions of up to 
13 TeV 3 center-of-mass energy, the LHC is the world’s most powerful parti¬ 
cle accelerator. The protons are accelerated in bunches of billions of particles 
and bunches moving in opposite directions are led to collide at the center 
of four gigantic particle detectors called ALICE, ATLAS, CMS and LHCb. 
In the LHC Run 1 configuration, these bunches collided every 50 ns at the 
heart of the detectors, resulting in some 20 million collisions per second 
in each detector, out of which the few hundred most interesting ones were 
stored for further analysis. In LHC Run 2, which started in June 2015, the 
collision rate is likely to be even higher. 

Out of the four detectors, ATLAS and CMS are multipurpose experiments 
capable of performing a large variety of physics analyses ranging from the 
discovery of the Higgs boson to precision studies of quantum chromody¬ 
namics. The other two detectors, ALICE and LHCb, specialize in studies of 
lead-ion collisions and 5-hadrons, respectively. In what follows, we focus on 
describing the data collection and analysis in the CMS experiment, which is 
also the source of the data of our unfolding demonstration in Section 6, but 
similar principles also apply to ATLAS and, to some extent, to other high 
energy physics experiments. 

The CMS experiment [Chatrchyan et al. (2008)], an acronym for Com¬ 
pact Muon Solenoid, is situated in an underground cavern along the LHC 
ring near the village of Cessy, France. The detector, weighing a total of 
12,500 tons, has a cylindrical shape with a diameter of 14.6 m and a length 


3 The electron volt, eV, is the customary unit of energy used in particle physics, 1 eV ~ 
1.6- 1(T 19 J. 



6 


M. KUUSELA AND V. M. PANARETOS 


of 21.6 m. The construction, operation and data analysis of the experiment 
is conducted by an international collaboration of over 4000 scientists, en¬ 
gineers and technicians. When two protons collide at the center of CMS, 
their energy is transformed into matter in the form of new particles. A small 
fraction of these particles are exotic, short-lived particles, such as the Higgs 
boson or the top quark, which are at the center of the scientific interest of 
the high energy physics community. Such particles decay almost instantly 
into more familiar, stable particles, such as electrons, muons or photons. 
Using various subdetectors, the energies and trajectories of these particles 
are recorded in order to study the properties and interactions of the exotic 
particles created in the collision. 

The layout of the CMS detector is illustrated in Figure 1. The detec¬ 
tor is immersed in a 3.8 T magnetic field created using a superconducting 
solenoid magnet. This magnetic field bends the trajectory of any charged 
particle traversing the detector. This enables the measurement of the par¬ 
ticle’s momentum, since the higher the momentum, the less the particle’s 
trajectory is bent. 

CMS consists of three layers of subdetectors: the tracker, the calorimeters 
and the muon detectors. The innermost detector is the silicon tracker, which 
consists of an inner layer of pixel detectors and an outer layer of microstrip 
detectors. When a charged particle passes through these semiconducting 
detectors, it leaves behind electron-hole pairs, and hence creates an electric 
signal. These signals are combined into a particle track using a Kalman filter 
in order to reconstruct the trajectory of the particle. 


Key: 


- Muon 

- Electron 

- Charged Hadron (e.g.Pion) 

- Neutral Hadron (e.g. Neutron) 
■ Photon 



Transverse slice 
through CMS 


Fig. 1. Illustration of the detection of particles at the CMS experiment [Barney (2004)]. 
Each type of particle leaves its characteristic trace in the various subdetectors of the exper¬ 
iment. This enables identification of different particles as well as the measurement of their 
energies and trajectories. Copyright: CERN, for the benefit of the CMS Collaboration. 
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The next layer of detectors are the calorimeters, which are devices for mea¬ 
suring the energies of particles. The CMS calorimeter system is divided into 
an electromagnetic calorimeter (ECAL) and a hadron calorimeter (HCAL). 
Both of these devices are based on the same general principle: they are 
made of extremely dense materials with the aim of stopping the particles 
passing through. In the process, a portion of the energy of these particles 
is converted into light in a scintillating material and the amount of light, 
which depends on the energy of the incoming particle, is measured using 
photodetectors inside the calorimeters. The ECAL measures the energy of 
particles that interact mostly via the electromagnetic interaction, in other 
words, electrons, positrons and photons. The HCAL, on the other hand, 
measures the energies of hadrons, that is, particles composed of quarks. 
These include, for example, protons, neutrons and pions. The HCAL is also 
instrumental in measuring the energies of jets, that is, collimated streams 
of hadrons produced by quarks and gluons, and in detecting the so-called 
missing transverse energy, an energy imbalance caused by noninteracting 
particles, such as neutrinos, escaping the detector. 

The outermost layer of CMS consists of muon detectors, whose task is to 
identify muons and measure their momenta. Accurate detection of muons 
was of central importance in the design of CMS since muons provide a clean 
signature for many exciting physics processes. This is because there is a 
very low probability for other particles, with the exception of noninteracting 
neutrinos, to penetrate through the CMS calorimeter system. For example, 
the four-muon decay channel played an important role in the discovery of 
the Higgs boson at CMS [Chatrchyan et al. (2012c)]. 

The information of all CMS subdetectors is combined [Chatrchyan et al. 
(2009)] to identify the stable particles, that is, muons, electrons, positrons, 
photons and various types of hadrons, produced in each collision event; see 
Figure 1. For example, a muon will leave a track in both the silicon tracker 
and the muon chamber, while a photon produces a signal in the ECAL with¬ 
out an associated track in the tracker. The information of these individual 
particles is then used to reconstruct higher-level physics objects, such as jets 
or missing transverse energy. 

2.2. Unfolding in LHC data analysis. The need for unfolding arises be¬ 
cause any quantity measured at the LHC detectors is corrupted by stochastic 
noise. For example, let £ be the true energy of an electron hitting the CMS 
ECAL. Then the observed value of the energy follows to a good approxi¬ 
mation the Gaussian distribution J\f(£, cr 2 (£)), where the variance satisfies 
[Chatrchyan et al. (2008)] 
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where S, N and C are fixed constants. The measurement noise is not al¬ 
ways additive. Furthermore, for more sophisticated measurements, such as 
the ones combining information from several subdetectors or more than one 
particle, the distribution of the response is usually not available in a closed 
form. Indeed, most analyses rely on detector simulations or auxiliary mea¬ 
surements to determine the detector response. 

It should be pointed out that not all LHC physics analyses directly rely 
on unfolding. The common factor between the examples given in Section 1 
is that these are measurement analyses and not discovery analyses, mean¬ 
ing that these are analyses studying in detail the properties of some already 
known phenomenon. In such a case, the experimental interest often lies in the 
detailed particle-level shape of some distribution which can be obtained us¬ 
ing unfolding, while discovery analyses are almost exclusively carried out in 
the smeared space. Unfolding nevertheless plays an indirect role in attempts 
to discover new physics at the LHC. Namely, discovery analyses often use 
unfolded results as inputs to their analysis chain. 

The need to unfold the measurements usually arises for the purposes of 
the following: 

• Comparison of experiments with different responses: The only direct way 
of comparing the spectra measured in two different experiments is to com¬ 
pare the unfolded measurements. 

• Input to a subsequent analysis: Certain tasks, such as the estimation of 
parton distributions functions or the fine-tuning of Monte Carlo event 
generators, typically require unfolded input spectra. 

• Comparison with future theories: When unfolded spectra are published, 
theorists can directly use them to compare with any new theoretical pre¬ 
dictions which might not have existed at the time of the original mea¬ 
surement. This justification is sometimes considered controversial since, 
alternatively, one could publish the response of the detector and the the¬ 
orists could use it to smear their new predictions. 

• Exploratory data analysis: The unfolded spectrum could reveal hidden 
features and structure in the data which are not considered in any of the 
existing theoretical predictions. 

According to the CERN Document Server (https : //cds . cern. ch/), the 
CMS experiment published in 2012 a total of 103 papers out of which 16 
made direct use of unfolding and many more indirectly relied on unfolded 
results. That year, unfolding was most often used in studies of quantum 
chromodynamics (4 papers), forward physics (4) and properties of the top 
quark (3). Most of these results relied on the questionable bin-by-bin heuris¬ 
tic (8), while the EM algorithm (3) and various forms of penalization (6) 
were also used. We expect similar statistics to also hold for the other LHC 
experiments. 
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3. Problem formulation. In most situations in high energy physics, the 
data generation mechanism can be modeled as a Poisson point process [see, 
e.g., Reiss (1993)]. Let £ be a compact interval on 1, / a nonnegative 
function in L 2 (E ) and M a discrete random measure on E. Then M is 
a Poisson point process on state space E with intensity function / if and 
only if: 

1. M(B ) ~ Poisson(A(£>)) with A (B) = j R f(s ) ds for every Borel set Bc£; 

2. M(B i),... ,M(B n ) are independent for pairwise disjoint Borel sets Bi C 

E, i = 1,... ,n. 

In other words, the number of points M(B) observed in the set B C E 
is Poisson distributed with mean f B f(s)ds and the number of points in 
disjoint sets are independent random variables. 

For the problem at hand, the Poisson process M represents the true, 
particle-level spectrum of events. The smeared, detector-level spectrum is 
represented by another Poisson process N. The process N is assumed to 
have a state space F, which is a compact interval on M, and a nonnegative 
intensity function g £ L 2 ( F ). The intensities of the two processes are related 
by a bounded linear operator K : L 2 (E) —>• L 2 (F) so that g = Kf . In what 
follows, we assume K to be a Fredholm integral operator, that is, 

(2) g (t) = (Kf)(t) = [ k(t,s)f(s)ds, 

J E 

where the kernel k € L 2 (F x E). For the purposes of this paper, we assume 
that k is known, although in reality there is usually an uncertainty associated 
with it; see Section 7. The unfolding problem is then to estimate the true 
intensity / given a single observation of the smeared Poisson process N. 

This Poisson inverse problem [Antoniadis and Bigot (2006), Reiss (1993)] 
is ill-posed in the sense that in virtually all practical cases the pseudoinverse 
iCi of the forward operator K is an unbounded, and hence discontinuous, 
linear operator [Engl, Hanke and Neubauer (2000)]. This means that the 
naive approach of first estimating g using, for example, a kernel density 
estimate g and then estimating / using / = K^g is unstable with respect to 
fluctuations of g. 

To better understand the physical meaning of the kernel k, let us con¬ 
sider the unfolding problem at the point level. Denoting by X t the true 
observables, the Poisson point process M can be written as M = ^T =1 <5 Yi , 
where 5xi is the Dirac measure at Xi € E and r, X\,X 2 , ■ ■ ■ are independent 
random variables such that r ~ Poisson(A(F!)) and the Xi are identically 
distributed with probability density f(-)/X(E), where A (E) = f E f(s)ds. 

When the particle corresponding to X t traverses the detector, the first 
thing that can happen is that it might not be observed at all due to the 
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limited efficiency and acceptance of the device. Mathematically, this corre¬ 
sponds to thinning of the Poisson process. Let Z, € {0,1} be an indicator 
variable showing whether the point Xj is observed (Zj = 1) or not (Z,, = 0). 
We assume that r, (Xi, Z\), (X 2 , Z 2 ),... are independent and that the pairs 
(Xj,Zj) are identically distributed. Then the thinned true process is given 

by M* = Y7i=\ z if>Xi = Ya= 1 fix* 1 where £ = Yli=i z i and the X* are the 
true points with Z,. = 1. The thinned process M* is a Poisson point process 
with intensity function f*(s) = e(s)/(s), where e(s) = P(Zj = l|Xj = s) is 
the efficiency of the detector for a true observation at s € E. 

For each observed point X* € E, the detector measures a noisy value Y] € 
F. We assume that the smeared observations Yi are i.i.d. with probability 
density 

(3) p{Yi =t)= f p(Yi = t\X* = s)p(X* = s ) ds. 

J E 

From this, it follows that the smeared observations Yi constitute a Poisson 
point process N = <5y; whose intensity function g is given by 

(4) g(t) = [ p{Yi = t\X* = s)e(s)f(s) ds. 

JE 

We hence identify that the kernel k in equation (2) is given by 

(5) k(t,s) =p{Yi = t\X* = s)e(s). 

Notice that in the special case where k(t, s) = k(t — s), unfolding becomes a 
deconvolution problem [Meister (2009)] for Poisson point process observa¬ 
tions. 

4. Unfolding methodology. 

4.1. Outline of the proposed methodology. In this section we propose a 
novel combination of statistical methods for solving the high energy physics 
unfolding problem formalized in Section 3. The proposed methodology is 
based on the following key ingredients: 

1. Discretization: 

(a) The smeared observations are discretized using a histogram. 

(b) The unknown particle-level intensity is modeled using a B-spline, 

that is, f(s) = 1 fijBjis), s € E, where Bj(s),j = 1,... ,p, are the 

B-spline basis functions. 

2. Point estimation: 

(a) Posterior mean estimation of the unknown basis coefficients (3 = 
[/?!,... ,/3 p ] T using a single-component Metropolis-Hastings sampler. 
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(b) Empirical Bayes estimation of the scale 5 of the regularizing smooth¬ 
ness prior p((3\S) using a Monte Carlo EM algorithm. 

3. Uncertainty quantification: 

(a) Iterative bias-correction of the point estimate (3. 

(b) Use of bootstrap percentile intervals of the bias-corrected intensity 
function to form pointwise frequentist confidence bands for /. 

This methodology enables a principled solution of the unfolding problem, 
including the choice of the regularization strength and frequentist uncer¬ 
tainty quantification. We explain below each of these steps in detail and 
argue why this particular choice of techniques provides a natural framework 
for solving the problem. 


4.2. Discretization of the problem. In applied situations, Poisson inverse 
problems are almost exclusively studied in a form where both the observable 
process N and the unobservable process M are discretized. Usually the 
first step is to discretize the observable process using a histogram. In most 
applications, this has to be done due to the discrete nature of the detector. In 
our case, the observations are, at least in principle, continuous, but we still 
carry out the discretization due to computational reasons. Indeed, in many 
analyses, there can be millions of observed collision events and treating each 
of these individually would not be computationally feasible. 

In order to discretize the smeared process N, let {Ej}” =1 be a partition of 
the smeared space F into n ordered intervals and let yi denote the number 
of points falling on interval Fi, that is, y, = N{Ff),i = l,...,n. This can 
be seen as recording the observed points in a histogram with bin contents 
y = [yi,..., y n ] T and is indeed the form of discretization most often employed 
in HEP. This discretization is convenient since it now follows from N being 
a Poisson process that the yi are independent and Poisson distributed with 
means 


(6) m= [ g(t)dt= [ [ k(t,s)f(s)dsdt, i = l,...,n. 

J Fi J Fi JE 

In the true space E, there is no need to settle only for histograms. 
Instead, we consider a basis expansion of the true intensity /, that is, 
f(s ) = s € E, where is a sufficiently large dictionary 

of basis functions. Substituting the basis expansion into equation (6), we 
find that the means pi are given by 



where we have denoted 


( 8 ) 


Ki,j = 



Fi JE 


k(t, s)f>j(s) ds dt, 


i = 1,... ,n,j = 1,... ,p. 
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Consequently, unfolding reduces to estimating (3 in the Poisson regression 
problem 

(9) y|/3 ~ Poisson(K/3) 

for an ill-conditioned matrix K = (K t] ). 

Since spectra in high energy physics are typically smooth functions, splines 
[de Boor (2001), Schumaker (2007), Wahba (1990)] provide a particularly 
attractive way of representing the unknown intensity /. Let min E = so < 
si < S 2 < ■ ■ ■ < sl < sl+i = ma xE be a sequence of knots in the true space 
E. Then an order-m spline with knots Sj, i = 0,..., L + 1, is a piecewise poly¬ 
nomial whose restriction to each interval [s,, Sj + i), / = 0,..., L, is an order-m 
polynomial (i.e., a polynomial of degree m — 1) and which has m — 2 con¬ 
tinuous derivatives at each interior knot Si,i = 1,... ,L. An order-m spline 
with L interior knots has p = L + m degrees of freedom. In this work, we use 
exclusively order-4 cubic splines which consist of third degree polynomials 
and are twice continuously differentiable. Note also that an order-1 spline 
yields a histogram representation of /. 

There exist various bases {(j)j}^ =1 for expressing splines of arbitrary order. 
We use B-splines Bj, j = 1,... ,p, that is, spline basis functions of minimal 
local support, because of their good numerical properties and conceptual 
simplicity. O’Sullivan (1986, 1988) was among the first authors to use reg¬ 
ularized B-spline estimators in statistical applications, with the approach 
later popularized by Eilers and Marx (1996). In the HEP unfolding litera¬ 
ture, penalized maximum likelihood estimation with B-splines goes back to 
the work of Blobel (1985) and recent contributions using similar method¬ 
ology include Dembinski and Roth (2011) and Milke et al. (2013). We use 
the Matlab Curve Fitting Toolbox to efficiently evaluate and perform ba¬ 
sic operations on B-splines. These algorithms rely on the recursive use of 
lower-order B-spline basis functions; for details, see de Boor (2001). 

The nonnegativity of the intensity function / is enforced by constraining 
/3 to be in = {x € R p : X* > 0, / = 1,... ,p}. Since each of the B-spline 
basis functions Bj,j = l,...,p, is nonnegative, this is a sufficient condition 
for the nonnegativity of /. It should be noted, however, that generally this 
is not a necessary condition for the nonnegativity of / (except for order-1 
and order-2 B-splines). That is, when imposing (3 £ R+, we are restricting 
ourselves to a proper subset of the set of positive splines which may incur 
a slight, but not restrictive, reduction in the versatility of the family of 
functions available to us [de Boor and Daniel (1974)]. 

4.3. Point estimation. 

4.3.1. Posterior mean estimation of the spline coefficients. In contrast 
to most work on unfolding, we take a Bayesian approach to estimation of 
the spline coefficients (3. That is, we estimate (3 using the Bayesian poste- 
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rior 


(10) p(f3\y,6) = 


p{y\(3)p((3\8) 


p(y\(3)p(f3\5) 


/9G 


p(y|£) p(y\P')p(P'\$)df3 n 

where the likelihood is given by the Poisson regression model (9), 


M-J 


( 11 ) 


p(y\P) = l 


= 1 f (gj=i K iM y ' c - yy; . 


i= 1 


yp 


/3e: 


The prior p(/3|<5), which regularizes the otherwise ill-posed problem, depends 
on a hyperparameter 8, which controls the concentration of the prior and 
is analogous to the regularization parameter in the classical methods for 
solving inverse problems. 

We decided to use the Bayesian approach for two reasons. First, it provides 
a natural interpretation for the regularization via the prior density p(f3\8), 
which should be chosen in such a way that most of its probability mass 
lies in physically plausible regions of the parameter space Second, the 
Bayesian framework enables a straightforward, data-driven way of choosing 
the regularization strength 8 using empirical Bayes estimation as explained 
below in Section 4.3.2. 

In order to regularize the problem, let /3 £ R+, and 8 > 0, and consider 
the truncated Gaussian smoothness prior 

(12) p((3\8) (xexp(-8\\f"\\l) = exp(-8 [ {/"(s)} 2 ds) = exp(-5/3 T 12/3), 

JE 

where the elements of the p x p matrix 12 are given by Ojj = f E B”(s)B" (s) ds 
The interpretation of this prior is that the total curvature of /, characterized 
by 1111 2 j should be small. In other words, / should be a relatively smooth 
function. The strength of the regularization is controlled by the hyperpa¬ 
rameter 8 —the larger the value of 8, the smoother / is required to be. 

The prior as defined by equation (12) does not enforce any boundary con¬ 
ditions for the unknown intensity /. As a result, the matrix fl has rank p — 2, 
and hence the prior is potentially improper (this depends on the orientation 
of the null space of 12). Although the posterior would still be a proper prob¬ 
ability density, the rank deficiency of 12 is undesirable since the empirical 
Bayes approach requires a proper prior distribution. Furthermore, without 
any boundary constraints, the unfolded intensity has an unnecessarily large 
variance near the boundaries. 

To address these issues, we use Aristotelian boundary conditions [Calvetti, 
Kaipio and Someralo (2006)], where the idea is to condition the smooth¬ 
ness penalty on the boundary values /(so) and /{sl+ i) and then intro¬ 
duce additional hyperpriors for these values. Since /(sq) = /3i-Bi(so) and 
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f{sL+ 1 ) = f3pB p (sL- f-i), we can equivalently condition on (/3i,/3 p ). As a re¬ 
sult, the prior model becomes 

(13) p{/3\6) =p(p 2 ,---,P P -i\l3i,Pp,5)p(f3i\8)p(P p \5), (3 gR p + , 

where p(/32, ■ ■ ■ ,/3 p _i|/3i,/3 p ,<5) oc exp(— 5/3 1 f2/3). We model the boundaries 
using once again truncated Gaussians: 

(14) p(Pi\5)ocexp(-5'Y L Pi), Pi> 0, 

(15) p(/3 p |<5) ocexp(-<57 R /3p), /3 P > 0, 

where 7 Lj7r > 0 are hxed constants. The full prior can then be written as 

(16) p(/3|<5) ocexp(—5/3 T r2 A /3), /3 gI+, 

where the elements of the p x p matrix Oa are given by 

+ 7L, if i = J = 1, 

Mi,j+ 7R, if * = J = P, 

Gjj, otherwise. 

The augmented matrix J4 a is positive definite, and hence equation (16) 
defines a proper probability density. 

Once the hyperparameter 6 has been estimated using empirical Bayes 
(see Section 4.3.2), we plug its estimate 6 into Bayes’ rule (10) to obtain the 
empirical Bayes posterior p(/3\y,5). We then use the mean of this posterior 
as a point estimator (3 of the spline coefficients (3, that is, (3 = E(/3|y, 5), 
yielding the estimator f(s ) = Yl P j=i PjBj{ s ) °f the unknown intensity /. 

Of course, in practice, the posterior p(/3 |y,6) is not available in a closed 
form because of the intractable integral in the denominator of Bayes’ rule (10) 
Hence, we need to resort to Markov chain Monte Carlo (MCMC) [Robert and 
Casella (2004)] sampling from the posterior and the posterior mean is then 
computed as the empirical mean of the Monte Carlo sample. Unfortunately, 
the most elementary MCMC samplers are not well-suited for solving the 
unfolding problem: Gibbs sampling is not computationally tractable since 
the full posterior conditionals do not belong to any of the standard fami¬ 
lies of probability distributions and the Metropolis-Hastings sampler with 
multivariate proposals is difficult to implement since the posterior can have 
very different scales for different components of (3. 

To be able to efficiently sample from the posterior, we adopt the single¬ 
component Metropolis-Hastings sampler (also known as the Metropolis- 
within-Gibbs sampler) proposed by Saquib, Bournan and Sauer (1998). De¬ 
noting (3_ k = /3fc_i, /3fc+i,..., /3 P } T , the basic idea of the sampler is 

to approximate the full posterior conditionals p(/3fc|/3_ fc ,y, 5) of the Gibbs 
sampler using a more tractable density [Gilks, Richardson and Spiegelhalter 
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(1996), Gilks (1996)]. One then samples from this approximate full condi¬ 
tional and performs a Metropolis-Hastings acceptance step to correct for the 
approximation error. In our case, we take a second-order Taylor expansion 
of the log full conditional, resulting in a Gaussian approximation of this 
density. When the mean of the Gaussian is nonnegative, we sample from 
its truncation to the nonnegative real line, and if the mean is negative, we 
replace the Gaussian tail by an exponential distribution. Further details on 
the MCMC sampler can be found in Section III.C of Saquib, Bouman and 
Sauer (1998), while the online supplement [Kuusela and Panaretos (2015)] 
provides details on the convergence and mixing checks that were performed 
for the sampler. 

4.3.2. Empirical Bayes selection of the regularization strength. The 
Bayesian approach to solving inverse problems is particularly attractive since 
it admits selection of the regularization strength 5 using marginal maximum 
likelihood estimation. For a comprehensive introduction to this and related 
empirical Bayes methods, see, for example, Chapter 5 of Carlin and Louis 
(2009). The main idea in empirical Bayes is to regard the marginal distribu¬ 
tion p(y|<5) appearing in the denominator of Bayes’ rule (10) as a parametric 
model for the data y and then use standard frequentist point estimation 
techniques to estimate the hyperparameter 5. 

The marginal maximum likelihood estimator (MMLE) of the hyperparam- 
eter 5 is defined as the maximizer of p(y|<5) with respect to 5. That is, we 
estimate <5 using 

(18) 5 = argmaxp(y|d) = argmax / p(y\(3)p(/3\5) d/3. 

6>o <5>o J r* 

Computing the MMLE is nontrivial since we cannot evaluate the high¬ 
dimensional integral in (18) either in a closed form or using standard nu¬ 
merical integration methods. Monte Carlo integration, where one samples 
{/3^}f =1 from the prior p(P\8) and then approximates 

1 5 

(19) p{y\8) ~-£^2p(y\(3 is) ), p{P\8), 

* S=1 

is also out of question. This is because, in the high-dimensional parameter 
space, most of the /3^’s fall on regions where the likelihood p(y|/3^) is 
numerically zero. Hence, we would need an enormous sample size S to get 
even a rough idea of the marginal likelihood p(y|<5). 

Luckily, it is possible to circumvent these issues by using the expectation- 
maximization (EM) algorithm [Dempster, Laird and Rubin (1977), McLach- 
lan and Krishnan (2008)] to find the MMLE. In the context of Poisson inverse 
problems, this approach was originally proposed by Geman and McClure 
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(1985, 1987) for tomographic image reconstruction and later studied and 
extended by Saquib, Bouman and Sauer (1998), but has received little at¬ 
tention since then. When applied to the unfolding problem, the standard 
EM prescription reads as follows. Let (y,/3) be the complete data, in which 
case the complete-data log-likelihood is given by 

(20) 1(6; y, (3) = logp(y, (3\6) = logp(y|/3) + logp(/3|<5), 

where we have used p(y,(3\5) = p(y\/3)p(/3\5). In the E-step of the algorithm, 
one computes the expectation of the complete-data log-likelihood over the 
unknown spline coefficients [3 conditional on the observations y and the 
current hyperparameter 5^: 

(21) Q(6; tfW) = E(l(8; y, /3)|y, <5«) = E(logp(y, /3|«5)|y, <5«) 

(22) =E(logp(/3|<5)|y,5 w ) +const, 

where the constant does not depend on 5. In the subsequent M-step, one 
maximizes the expected complete-data log-likelihood Q(6;6®) with respect 
to the hyperparameter 5. This maximizer is then used as the hyperparameter 
estimate on the next step of the algorithm: 

(23) = argmaxQ(<5;<!>W) = argmaxE(logp(/3|<5)|y,<^). 

<5>0 <5>0 

By Theorem 1 of Dempster, Laird and Rubin (1977), each step of this itera¬ 
tion is guaranteed to increase the incomplete-data likelihood p(y|<5), that is, 

p(y|<5 ( * +1) ) > p(y\fi^),t = 0,1,2,_With this construction, the incomplete- 

data likelihood conveniently coincides with the marginal likelihood and, 
hence, the EM algorithm enables us find the MMLE 5 of the hyperparame¬ 
ter 5. 

The expectation in equation (23), 

(24) E(logp(/3|<5)|y,<5 w ) = [ p((3\y, <5 W ) logp((3\6) d/3, 

Jr p + 

again involves an intractable integral, but this time can be computed using 
Monte Carlo integration. We simply need to sample {/3^}f =1 from the 
posterior p((3\y,6^) and then replace the expectation by its Monte Carlo 
approximation: 

1 S 

(25) E(logp(/3|<5)|y,d w )«-^logp(/3 (s) |<5), /3 (s) ~p(/3|y,<5 w ). 

S= 1 

This Monte Carlo approximation is better behaved than that of equation (19) 
due to the appearance of the logarithm and due to the fact that the sam¬ 
pling is from the posterior instead of the prior. The posterior sample can 
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be obtained using the single-component Metropolis-Hastings sampler de¬ 
scribed in Section 4.3.1. The resulting variant of the EM algorithm is called 
a Monte Carlo expectation-maximization (MCEM) algorithm [Wei and Tan¬ 
ner (1990)]. 

To summarize, the MCEM algorithm for finding the MMLE of the hyper¬ 
parameter 5 iterates between the following two steps: 

E-step : Sample /3 ^,..., /3^ from the posterior p(/3 |y, ) and compute 

1 5 

(26) Q(5;5^) = -Y,logp((3^\S). 

S= 1 

M-step : Set = argmax 5>0 Q(<5; <jW). 

This algorithm has a rather intuitive interpretation. First, on the E-step, we 
use the current iterate 5^ to produce a sample of /3’s from the posterior. 
Since this sample summarizes our current best understanding of (3, we then 
tune the prior by varying 5 on the M-step to match this sample as well as 
possible, and the value of 5 that matches the posterior sample the best will 
then become the next iterate m* +1 ). 

When p{j3\5) is given by the Aristotelian smoothness prior (16), the M- 
step of the MCEM algorithm is available in a closed form. Taking normal¬ 
ization into account, the prior density is given by 

(27) P (f3\5) = C(S) exp(—<5/3 T f2 A /3), 


where the normalization constant C(5) depends on the hyperparameter 5 
and satisfies C(5) =S P ^ 2 /f RP exp(—/3 t Oa/3) d/3. Hence, 

(28) log p(/3\6) = | log S - 5(3 t CIa/3 + const, 


where the constant does not depend on 5. Plugging this into equation (26), 
we find that the maximizer on the M-step is given by 


(29) 


<5(1+1) ____ 

(V(pS))ELAP {s) ) t ^a(3^' 


The resulting iteration for finding the MMLE S is summarized in Algo¬ 
rithm 1. The MCMC sampler is started from the empirical mean of the 
posterior sample of the previous iteration in order to facilitate the conver¬ 
gence of the Markov chain. In this work, we run the MCEM algorithm for 
a fixed number of steps 1Vem> but one could easily devise more elaborate 
stopping rules for the algorithm. Note, however, that the optimal choice of 
this stopping rule and the MCMC sample size S are, to a large extent, open 
problems [Booth and Hobert (1999)]. 
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Algorithm 1 MCEM algorithm for finding the MMLE 

Input: 

y—Observed data; 

A(°) > 0—Initial guess; 

-ZVem —Number of MCEM iterations; 

S —Size of the MCMC sample; 

Anit —Starting point for the MCMC sampler; 

Output: 

5 —MMLE of the hyperparameter <5; 

Set f3 Amt ’ 

for i = 0 to IVem — 1 do 

Sample ... ,(3 < ' S ' 1 p(/3\y,5^) starting from f3 using the 

single-component Metropolis-Hastings sampler of Saquib, Bouman and 
Sauer (1998); 

Set 

^(*+ 1 ) _ _ 1 _. 

(.V(pS))Es=i(P {s) ) T nAf3 {sy 

Compute /3 = ^ )Tf=i /3^; 

end for 

return 5 = S^ Ne m ). 


4.4. Uncertainty quantification. Frequentist uncertainty quantification 
in nonparametric inference is generally considered to be a very challenging 
problem; see, for example, Chapter 6 of Ruppert, Wand and Carroll (2003) 
for an overview of some of the issues involved. Common approaches consid¬ 
ered in the literature include bootstrapping [Davison and Hinkley (1997)] 
and various Bayesian constructions [see, e.g., Wahba (1983), Wood (2006) 
and Weir (1997)]. In the case of classical nonparametric regression, Bayesian 
intervals are often argued to have good frequentist properties based on the 
results of Nychka (1988), which guarantee that the coverage probability av¬ 
eraged over the design points is close to the nominal value. Such average 
coverage, however, provides no guarantees about pointwise coverage and 
the intervals can suffer from significant pointwise under- or overcoverage as 
demonstrated by Ruppert and Carroll (2000). 

The main problem in building confidence intervals for the unknown in¬ 
tensity / is the bias that is present in the estimator / in order to regularize 
the problem. We show using simulations in Section 5 that this bias is a ma¬ 
jor hurdle for both Bayesian and bootstrap confidence intervals, resulting 
in major frequentist undercoverage in regions of sizable bias. To overcome 
this issue, we propose attacking the problem from a different perspective: 






STATISTICAL UNFOLDING OF ELEMENTARY PARTICLE SPECTRA 19 


instead of directly using the variability of / to construct confidence inter¬ 
vals, we first iteratively reduce the bias of / and then use the variability 
of the bias-corrected estimator /bc to construct confidence intervals. This 
approach has similarities with the recent work of Javanmard and Montanari 
(2014) on uncertainty quantification in high-dimensional regression by de¬ 
biasing the estimator, but our problem setting and bias-correction method 
are different from theirs. 

It may at first seem counterintuitive that reducing the bias of / enables us 
to form improved confidence intervals—it is after all the bias that regular¬ 
izes the otherwise ill-posed problem. It is indeed true that the iterative bias- 
correction described below increases the variance of the point estimator /bc ; 
but at the same time the coverage performance of the intervals formed using 
/bc improves at each iteration of the procedure. What is more, our simu¬ 
lations reported in Section 5 indicate that, by stopping the bias-correction 
iteration early enough, it is possible to attain nearly nominal coverage with 
only a modest increase in interval length. In other words, one can use the 
iterative bias-correction to remove so much of the bias that the interval 
coverage probability is close to its nominal value, but the small amount 
of residual bias that remains in enough to regularize the interval length. 
This phenomenon is consistent with what Javanmard and Montanari (2014) 
observe in debiased ^-regularized regression. 

4.4.1. Iterative bias-correction. Our bias-correction approach is based on 
an iterative use of the bootstrap to estimate the bias of the point estimate 
(3. A similar approach has been previously used by Kuk (1995) and Gold¬ 
stein (1996) to debias estimators in generalized linear mixed models, but, 
to the best of our knowledge, this procedure has not been previously em¬ 
ployed to improve the frequentist coverage of confidence intervals in ill-posed 
nonparametric inverse problems, such as the one studied here. 

By definition, the bias of (3 is given by bias(/3) = E /3 0) — (3- The standard 
bootstrap estimate of this bias [see, e.g., Davison and Hinkley (1997)] re¬ 
places the actual value of (3 by the observed value of the estimator (3 which 
we denote by /3®. In other words, the bootstrap estimate of the bias is 

(30) bias® 0) = E^o) 0 ) ~ P {0) , 

where in practice the expectation is usually replaced by its empirical version 
obtained using simulations. The estimated bias can then be subtracted from 
/3® to obtain a bias-corrected estimator 

(31) /3® = /3® — bias® (/3). 

Now, ignoring the bootstrap sampling error, the reason /3® is not a perfectly 
unbiased estimator of (3 is the fact that in equation (30) the actual value of 
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/ 3 was replaced by an estimate (3^ . But since (3^ should be better than 
P^ at estimating (3 (at least in the sense that /T 1 ) should be less biased 
than /3(°)), we can replace / 3 in equation (30) by (3^ to obtain a new 
estimate of the bias 

(32) bias ( 1 ) (/3) = E^ W 0) -/3 (1) , 
which gives rise to a new bias-corrected estimator 

(33) fa® -&W0). 

The estimator (3^ can then be used to replace (3^ in equation (32) which 
naturally leads us to the following iterative bias-correction procedure: 

1 . Estimate the bias: biasW(/3) = E^) (P) — (3^. 

2 . Compute the bias-corrected estimate: /3^ +1) = (3^ — biasW(/3). 

The details of this procedure, when the expectation E^q) (j3) is replaced 
by its empirical version and the positivity constraint of P is enforced by 
setting any negative entries to zero, are given in Algorithm 2. Note that for 


Algorithm 2 Iterative bias-correction 

Input: 

/T°) -Observed value of the estimator /3; 

5 —Estimated value of the hyperparameter 6; 

Nbc —Number of bias-correction iterations; 

Rbc —Size of the bootstrap sample; 

S —Size of the MCMC sample; 

Output: 

$bc~ Bias-corrected point estimate; 

for i = 0 to Nbc — 1 do 

Sample y*W,y*( 2 ),... } y*( R nc) 'Ti 1, Poisson(K/3®); 

For each r = 1,... ,-Rbc> compute /3*( r ) = E(/3|y*( r \ <5) by sampling S 
observations from the posterior using the single-component Metropolis- 
Hastings sampler of Saquib, Bouman and Sauer (1998); 

Compute biasW(3) = 7 ^: — (3^; 

Set = /3(°) — biasW (3); 

Set = 1{/3'( 1+1 ' ) > 0 } o / 3 '( l+1 ), where o denotes the element-wise 

product; 

end for 

return /3 B c = P^ Nbc ^ ■ 
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computational reasons we keep the hyperparameter 5 fixed to its estimated 
value 5 instead of resampling it. 

4.4.2. Pointwise confidence bands from the bias-corrected estimator. The 
bias-corrected spline coefficients /3 BC are associated with a bias-corrected 
intensity function estimate /bc (s) = Yl P j= i Pbcj Bj(s). As the bias-corrected 
estimator /bc has a smaller bias than the original estimator /, the variability 
of /bc can be used to construct confidence intervals for / that have better 
coverage properties than the intervals based on the variability of /. The 
basic approach we follow is to use the bootstrap to probe the sampling 
distribution of /bc hi order to construct pointwise 4 confidence bands for /. 

To obtain the bootstrap sample, we generate R\jq i.i.d. observations from 
the Poisson(/l) distribution, where fi = y is the maximum likelihood esti¬ 
mator of fi = K (3. For each resampled observation y*( r ), r = 1,..., R\jq, we 
compute a resampled point estimate /3*h) = E(/3|y* ( ' r ^,5). Again, for com¬ 
putational reasons, the hyperparameter 5 is kept fixed to its estimated value 
5 for the observed datum y. The bias-correction procedure described in Al¬ 
gorithm 2 is then run with the resampled point estimate (3*^ to obtain a 
resampled bias-corrected point estimate /3 B( ^, leading to a resampled bias- 
corrected intensity function / B( f?. 

The sample = {/gc }f=i ! is a bootstrap representation of the sam¬ 
pling distribution of /bc and can be used to construct various forms of ap¬ 
proximate confidence intervals for / [Efron and Tibshirani (1993), Davison 
and Hinkley (1997)]. The simplest approach is to simply use the pointwise 
standard deviations of the bootstrap sample to form standard error intervals 
for /. However, it was concluded using simulations that the standard error 
intervals suffer from a slight reduction in coverage due to the skewness of 
the sampling distribution induced by the positivity constraint, especially in 
areas of low intensity values. To account for this effect, we use instead the 
bootstrap percentile intervals. More specifically, for every s € E, an approx¬ 
imate 1 — 2 a confidence interval for f(s ) is given by [/bc a( s )> /bc i- q ( s )L 
where /bc q ( s ) denotes the a-quantile of the bias-corrected bootstrap sam¬ 
ple -F BC evaluated at point s G E. The formal justification for these intervals 
comes from an implicit use of a transformation that (approximately) nor- 


4 For simplicity, we restrict our attention to 1 — 2 a pointwise frequentist confidence 
bands, that is, collections of random intervals [/(s; y), /(s; y)] that for any s £ E, a £ 
(0,0.5) and intensity function / aim to satisfy P/(/(s;y) < /(s) < /(s;y)) > 1 — 2a. This 
is in contrast with 1 — 2a simultaneous confidence bands, which would satisfy for any a G 
(0,0.5) and intensity function / the property P/(/(s;y) < /(s) < /(s;y), Vs £ E) > 1 — 2a; 
see also the discussion in Section 7. 
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malizes the sampling distribution of /bc( s )> see Section 13.3 of Efron and 
Tibshirani (1993). 

5. Simulation studies. 

5.1. Experiment setup. We first demonstrate the proposed unfolding 
methodology using simulated data. The data were generated using a two- 
component Gaussian mixture model on top of a uniform background and 
smeared by convolving the particle-level intensity with a Gaussian density. 
Specifically, the true process M had the intensity 

(34) /(s) = A to t|vriAA(s|-2,l) + 7r 2 AA(s|2 ,l)+7 r3|^|-|, s € E, 

where A to t = E(t) = f E f(s)ds > 0 is the expected number of true observa¬ 
tions, \E\ denotes the Lebesgue measure of E and the mixing proportions 7 r 
sum up to one and were set to 7Ti = 0.2, 7 t 2 = 0.5 and 7r3 = 0.3. We consider 
the sample sizes Atot = 1000, Atot = 10,000 and Atot = 20,000, which we refer 
to as the small, medium and large sample size experiments, respectively. The 
true space E and the smeared space F were both taken to be the interval 
[—7,7]. The true points X* were smeared with additive Gaussian noise of 
zero mean and unit variance. Points smeared beyond the boundaries of F 
were discarded from further analysis. With this setup, the smeared intensity 
is given by the convolution 

(35) g(t) = (Kf)(t) = [ J\f(t- s|0,l)/(s)ds, t G F. 

Je 

Note that this setup corresponds to the classically most difficult class of 
deconvolution problems since the Gaussian error has a supersmooth proba¬ 
bility density [Meister (2009)]. 

The smeared space F was discretized using n = 40 histogram bins of 
uniform size, while the true space E was discretized using order-4 B-splines 
with L = 26 uniformly placed interior knots, resulting in p = L + 4 = 30 
unknown basis coefficients. With these choices, the condition number of the 
smearing matrix K was cond(K) ~ 2.6 • 10 8 , indicating that the problem is 
severely ill-posed. The boundary hyperparameters were set to 7 l = 7 r = 5. 
All experiments reported in this paper were implemented in MATLAB and 
the computations were carried out on a desktop computer with a quad-core 
2.7 GHz Intel Core i5 processor. The outer bootstrap loop was parallelized 
to the four cores of this setup. 

With the exception of the number of bias-correction iterations, all the 
algorithmic parameters were fixed to the same values for the three different 
sample sizes. The MCEM algorithm was started using the initial hyperpa¬ 
rameter = 1 • 10 -5 and was run for 30 iterations. The MCMC sampler 
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was started from the nonnegative least-squares spline fit to the smeared 
data, that is, /3 init = ming> 0 ||K (3 — y|||, where the elements of K are given 
by equation (8) with the smearing kernel k(t,s ) = 5o(t — s ), where do de¬ 
notes the Dirac delta function. The sampler was then used to obtain 1000 
post-burn-in observations from the posterior. The number of bias-correction 
iterations was set to IVbc = 15 for the small sample size experiment, while 
IVbc = 5 iterations was used with the medium and the large sample size. 
In each case, Rbc = 10 bootstrap observations were used to obtain the bias 
estimates and the bias-corrected percentile intervals were obtained using 
R\jq = 200 bootstrap observations. 

5.2. Results. 

5.2.1. Unfolded intensities. For each sample size, 30 MCEM iterations 
sufficed for convergence to a stable point estimate of the hyperparameter 5 , 
with faster convergence for larger sample sizes. In each case, there was lit¬ 
tle Monte Carlo variation in the hyperparameter estimates. The estimated 
hyperparameters were 5 = 2.0 • 10~ 4 , 5 = 8.3 • 10 - ' and 5 = 2.8 • 10“' in the 
small, medium and large sample size cases, respectively. A figure illustrating 
the convergence of the MCEM iteration is given in the online supplement 
[Kuusela and Panaretos (2015), Figure 3(a)]. In each case, the running time 
to obtain 5, and thence the point estimate /, was approximately 8 minutes, 
while the time it took to form the confidence intervals was approximately 
14 hours for the medium and large sample sizes and 39 hours for the small 
sample size, where three times as many bias-correction iterations were per¬ 
formed. While these are fairly long computations on a desktop setup, it 
should be noted that it is easy to further parallelize the bootstrap compu¬ 
tations and the running time could be cut down substantially on a modern 
massively parallel cloud computing platform. 

Figure 2 shows the true intensities /, the unfolded intensities / and the 
bias-corrected unfolded intensities /bc f° r the three sample sizes. The confi¬ 
dence bands consist of 95% pointwise percentile intervals obtained by boot¬ 
strapping the bias-corrected point estimate as described in Section 4.4. In 
each case, the unfolded intensity beautifully captures the two-humped shape 
of the true intensity, and, unsurprisingly, the more data is available, the bet¬ 
ter the quality of the estimate. It is also apparent that the estimators are 
biased near the peaks and the trough of the true intensity, and the relative 
size of this bias is larger for smaller sample sizes. In each case, the bias can 
be reduced using the iterative bias-correction procedure, but at the cost of 
an increased variance of the point estimator. 

For each sample size, the confidence intervals perform well at covering the 
true intensity without being excessively long or wiggly. For these particular 
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(c) Gaussian mixture model data with Atot = 1000 


Fig. 2. Unfolding results for the Gaussian mixture model data with (a) Atot = 20,000, 
(b) Atot = 10,000 and (c) Atot = 1000. The confidence intervals are the 95% pointwise 
percentile intervals of the bias-corrected estimator. The number of bias-correction iterations 
was 5 in cases (a) and (b) and 15 in case (c). 
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realizations, the intervals cover the truth at every point s € E in the large 
and small sample size cases, while with the medium sample size the intervals 
cover everywhere except for a short section near s = —1.5 and another one 
near s = 0. 

5.2.2. Uncertainty quantification performance. We study the coverage 
performance of the bias-corrected intervals shown in Figure 2 using a Gaus¬ 
sian approximation to the Poisson likelihood described in Section 2.2 of the 
online supplement [Kuusela and Panaretos (2015)]. With this approxima¬ 
tion, finding of the point estimator (3 effectively reduces to a ridge regres¬ 
sion (i.e., Tikhonov regularization) problem for which the solution can be 
computed in a fraction of the time required for the posterior mean with the 
full Poisson likelihood. This enables us to perform an otherwise computa¬ 
tionally infeasible empirical coverage study for the bias-corrected intervals. 
The Gaussian approximated intervals look similar to the full intervals (see 
Section 3.2 of the supplement [Kuusela and Panaretos (2015)] for a detailed 
comparison), and we expect the coverage of the full intervals to be similar 
to or better than that of the Gaussian intervals. 

The coverage of the bias-corrected intervals is compared to alternative 
bootstrap and empirical Bayes intervals. The empirical Bayes intervals we 
consider are the 95% equal-tailed credible intervals induced by the empiri¬ 
cal Bayes posterior p(/3\y,5). The bootstrap intervals that we compare with 
are the standard percentile intervals without bias-correction, which are ob¬ 
tained from our procedure by setting the number of bias-correction itera¬ 
tions IVbc to zero. We also consider the bootstrap basic intervals which for 
point seE are given by [2/(s) - / 0 * 975 (s), 2/(s) - /o. 025 (s)], where f*(s) is 
the a-quantile of the bootstrap sample of nonbias-corrected unfolded inten¬ 
sities obtained using a fixed hyperparameter 5 and the resampling scheme 
y*(i), y *( 2 ), _ _ _ 5 y*CRuQ) 'Tf 1 - Poisson (K/3) with R\jq = 200. The bootstrap in¬ 
tervals are formed using the Gaussian approximation, while the empirical 
Bayes intervals are for the full Poisson likelihood. 

Figure 3(a) shows the empirical coverage of the bias-corrected intervals 
obtained using the Gaussian approximation and the various alternatives 
in the medium sample size case for 1000 repeated observations from the 
smeared process N. The coverage of the bias-corrected intervals is close 
to the nominal value of 95% throughout the spectrum (average coverage 
94.6%), although a minor reduction in coverage due to the residual bias can 
be observed near the peak at s = 2, where the lowest coverage is 91.7%. A 
very different behavior is observed with the empirical Bayes intervals: these 
intervals overcover for most parts of the spectrum, but at regions of signifi¬ 
cant bias they undercover. In particular, near s = 2, the worst-case coverage 
of these intervals is 74.5%. The standard bootstrap intervals, on the other 
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(a) Comparison of UQ performance, Atot = 10,000 



(b) Effect of bias-correct ion on UQ performance, Atot = 10,000 


Fig. 3. Coverage studies for the Gaussian mixture model data with expected sample size 
Atot = 10,000. Figure (a) compares the empirical coverage of the iteratively bias-corrected 
intervals with 5 bias-correction iterations to that of empirical Bayes credible intervals as 
well as the nonbias-corrected bootstrap percentile and basic intervals. Figure (b) shows the 
empirical coverage of the bias-corrected intervals as the number of bias-correction iterations 
is varied between 0 and 50. All intervals are formed for 95% nominal coverage shown by 
the horizontal line. 
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hand, consistently undercover (average coverage 81.6% for the percentile and 
79.2% for the basic intervals) and the nonbias-corrected percentile intervals 
in particular fail to cover near the peak at s = 2, with empirical coverage of 
just 33.1%. 

To gain further insight into the improvement in coverage provided by the 
bias-correction, we plot in Figure 3(b) the empirical coverage for various 
numbers of bias-correction iterations. For IVbc = 0, the coverage is simply 
that of the standard percentile intervals. We see that a single bias-correction 
iteration already improves the coverage significantly, with further iterations 
always improving the performance. In fact, increasing IVbc from 5, we are 
able to do away with the dip in Figure 3(a) around s = 2. However, we pre¬ 
ferred settling with IVbc = 5, as increasing the number of iterations produced 
increasingly wiggly intervals. Interestingly, the coverage of the bias-corrected 
intervals does not seem to suffer from the fact that all the bootstrap compu¬ 
tations were performed without resampling the hyperparameter estimate <5, 
which suggests that the uncertainty of 6 plays only a minor role in the overall 
uncertainty of /. 

The coverage patterns reported here for Atot = 10,000 repeat themselves 
in the small and large sample size situations, with starker differences be¬ 
tween the methods with the small sample size and milder differences with 
the large sample size. In the large sample size case, the bias-corrected in¬ 
tervals effectively attain nominal coverage (average coverage 94.7%), with 
no major deviations from the nominal due to residual bias. The bootstrap 
intervals, on the other hand, undercover, while the empirical Bayes inter¬ 
vals overcover, except for the peak at s = 2 where slight undercoverage is 
observed. In the most difficult small sample size case, both the bootstrap 
and the empirical Bayes intervals have major problems with undercoverage, 
while the bias-corrected intervals achieve close-to-nominal coverage on most 
parts of the spectrum except for some undercoverage at s = 2, where the 
worst-case coverage is 85.0%. See the supplement [Kuusela and Panaretos 
(2015)] for detailed coverage plots for the small and large sample size exper¬ 
iments as well as plots showing a realization of the intervals for the different 
methods and numbers of bias-correction iterations. The supplement also in¬ 
cludes a comparison with hierarchical Bayes intervals which are observed to 
provide qualitatively similar coverage performance as the empirical Bayes 
intervals, although with improved worst-case coverage for some choices of 
the hyperprior. 

6. Unfolding of the Z boson invariant mass spectrum. 

6.1. Description of the data. In this section we illustrate the proposed 
unfolding framework using real data from the CMS experiment at the Large 
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Hadron Collider. In particular, we unfold the Z boson invariant mass spec¬ 
trum published in Chatrchyan et al. (2013). The Z boson, which is produced 
in copious quantities at the LHC, is a mediator of the weak interaction. The 
particle is very short-lived and decays almost instantly into other elementary 
particles. The decay mode considered here is the decay of a Z boson into 
a positron and an electron, Z -A e + e _ . The original purpose of these data 
was to calibrate and measure the resolution of the CMS electromagnetic 
calorimeter, but they also serve as an excellent testbed for unfolding since 
the true mass spectrum of the Z boson is known with great precision from 
previous measurements. 

The electron and the positron produced in the decay of the Z boson are 
first detected in the CMS silicon tracker after which their energies Si, i = 1,2, 
are measured by stopping the particles at the ECAL; see Section 2.1. From 
this information, one can compute the invariant mass W of the electron- 
positron system defined by the equation 

(36) W 2 = (S i + T 2 ) 2 -|| Pl +p 2 ||i, 

where pi,i = 1,2, are the momenta of the two particles and the equation is 
written using the natural units where the speed of light c = 1. Since ||Pi ||§ = 
S 2 — m 2 , where m e is the rest mass of the electron, one can reconstruct the 
invariant mass W using only the ECAL energy deposits Si and the opening 
angle between the two tracks in the silicon tracker. 

The invariant mass W is preserved in particle decays. Furthermore, it is 
invariant under Lorentz transformations and has therefore the same value 
in every frame of reference. This means that the invariant mass of the Z bo¬ 
son, which is simply its rest mass m, is equal to the invariant mass of the 
electron-positron system, W = m. It follows that measurement of the in¬ 
variant mass spectrum of the electron-positron pair enables us to measure 
the mass spectrum of the Z boson itself. 

Due to the time-energy uncertainty principle, the Z boson does not have 
a unique rest mass m. Instead, the mass follows the Cauchy distribution, 
also known in particle physics as the Breit-Wigner distribution, with density 

pi - w) - 2n (m — m z ) 2 + r 2 /4’ 

where mz = 91.1876 GeV is the mode of the distribution (often simply 
called the mass of the Z boson) and T = 2.4952 GeV is the full width of the 
distribution at half maximum [Beringer et al. (2012)]. Since the contribution 
of background processes to the electron positron channel near the Z peak 
is negligible [Chatrchyan et al. (2013)], the underlying true intensity f(m) 
is proportional to p(m). 

The dominant source of smearing in measuring the Z boson invariant 
mass m is the measurement of the energy deposits Si in the ECAL. The 
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resolution of these energy deposits is in principle described by equation ( 1 ). 
However, when working on a small enough invariant mass interval around 
the Z peak, one can in practice assume that the response is given by a 
convolution with a fixed-width Gaussian kernel. Moreover, the left tail of 
the kernel is typically replaced with a more slowly decaying tail function in 
order to account for energy losses in the ECAL. It is therefore customary to 
model the response using the so-called Crystal Ball (CB) function [Oreglia 
(1980), Chatrchyan et al. (2013)] given by 

CB(m| Am, a 2 , a, 7 ) 

(38) 

where a, a > 0, 7 > 1 and C is a normalization constant chosen so that the 
function is a probability density. This function is a Gaussian density with 
mean Am and variance a 2 , where the left tail is replaced with a power- 
law function. The parameter a controls the location of the transition from 
exponential decay into power-law decay and the parameter 7 controls the 
decay rate of the power-law tail. The forward mapping K in equation (2) 
is then given by a convolution where the kernel is the CB function, that is, 
k(t, s ) = k(t — s) = CB (t — s\Am, a 2 , a, 7 ). 

The data set we use is a digitized version of the lower left-hand plot of 
Figure 11 in Chatrchyan et al. (2013). These data correspond to an inte¬ 
grated luminosity 5 of 4.98 fb -1 collected at the LHC in 2011 at the 7 TeV 
center-of-mass energy and include 67 778 electron-positron events with the 
measured invariant mass between 65 GeV and 115 GeV. The data are dis¬ 
cretized using a histogram with 100 bins of uniform width. For details on 
the event selection, see Chatrchyan et al. (2013) and the references therein. 

In order to estimate the parameters of the Crystal Ball function, we di¬ 
vided the data set into two independent samples by drawing a binomial 
random variable independently for each bin with the number of trials equal 
to the observed bin contents. Consequently, the bins of the resulting two 
histograms are marginally mutually independent and Poisson distributed. 
Each observed event had a 70% probability of belonging to the sample y 
used for the unfolding demonstration and a 30% probability of belonging to 
the sample used for CB parameter estimation. 


5 The number of particle reactions that took place in the accelerator is proportional to 
the integrated luminosity. As such, it is a measure of the amount of data produced by the 
accelerator. It is measured in units of inverse femtobarns, fb -1 . 
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The CB parameters (Am,a 2 ,a, r j) were estimated using maximum likeli¬ 
hood with the subsampled data on the full invariant mass range 65-115 GeV 
assuming that the true intensity is proportional to the Breit-Wigner distri¬ 
bution of equation (37). The unknown proportionality constant of the true 
intensity was also estimated as part of the maximum likelihood fit. The 
maximum likelihood estimates of the CB parameters were 

(39) (Am, a 2 , a, 7 ) = (0.56 GeV, (1.01 GeV) 2 ,1.95,1.40), 

indicating that the measured invariant mass is on average 0.56 GeV too high 
and has an experimental resolution of approximately 1 GeV. As a cross-check 
of the fit, the estimated Crystal Ball function was used to smear the true 
intensity to obtain the corresponding expected smeared histogram, which 
was found to be in good agreement with the observations. 

6.2. Unfolding setup and results. To carry out the unfolding of the Z 
boson invariant mass, we used the subsampled n = 30 bins on the inter¬ 
val F = [82.5 GeV,97.5 GeV]. The resulting histogram y had a total of 
42,475 electron- positron events. To account for events that are smeared 
into the observed interval F from the outside, we let the true space E = 
[81.5 GeV,98.5 GeV], that is, we extended it by approximately 1 a on both 
sides with respect to F. The true space E was discretized using order-4 B- 
splines with L = 34 uniformly placed interior knots, resulting in p = 38 un¬ 
known spline coefficients. It was found out that such overparameterization 
with p> n facilitated the mixing of the MCMC sampler. With these choices, 
the condition number of the smearing matrix was cond(K) ~ 8.1 • 10 3 . The 
boundary hyperparameters were set to 7 l = 7 r = 50. 

The parameters of the unfolding algorithm were set to the same values 
as in the medium and large sample size experiments of Section 5. In par¬ 
ticular, the number of bias-correction iterations was set to IVbc = 5. The 
MCEM algorithm converged after approximately 10 iterations to the hyper¬ 
parameter estimate 5 = 7.0 • 10 -8 with little Monte Carlo variation. Finding 
the hyperparameter 5 followed by the point estimate / took 10 minutes, 
while the running time to obtain the bias-corrected confidence intervals was 
approximately 17 hours. 

In Figure 4, the unfolded intensity / and the 95% pointwise percentile 
intervals of the bias-corrected intensity /bc are compared with the Breit- 
Wigner shape of the Z boson mass peak. The proportionality constant of the 
true intensity was obtained from the maximum likelihood fit described in 
Section 6.1. The figure also shows a histogram estimate of the smeared inten¬ 
sity given by the observed event counts y divided by the 0.5 GeV bin width. 
The unfolding algorithm is able to correctly reconstruct the location and 
width of the Z mass peak which are both distorted by the smearing in the 
ECAL. The intensity is also estimated reasonably well in the 1 GeV regions 
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Fig. 4. Unfolding of the Z boson invariant mass spectrum. The confidence band consists 
of the 95% pointwise percentile intervals of the bias-corrected estimator obtained using 
5 bias-correction iterations. The points show a histogram estimate of the smeared intensity. 
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in the tails of the intensity where no smeared observations were available. 
More importantly, the 95% bias-corrected confidence intervals cover the true 
mass peak across the whole invariant mass spectrum. We also observe that 
the bias-correction is particularly important near the top of the mass peak, 
where the original point estimate / appears to be somewhat biased. 

7. Concluding remarks. We have studied a novel approach to solving the 
high energy physics unfolding problem involving empirical Bayes selection of 
the regularization strength and frequentist uncertainty quantification using 
an iterative bias-correction. We have shown that empirical Bayes provides a 
straightforward, fully data-driven way of choosing the hyperparameter S and 
demonstrated that the method provides good estimates of the regularization 
strength with both simulated and real data. Given the good performance of 
the approach, we anticipate empirical Bayes methods to also be valuable in 
solving other inverse problems beyond the unfolding problem. 

The natural fully Bayesian alternative to empirical Bayes consists of using 
a Bayesian hierarchical model which necessitates the choice of a hyperprior 
for 5. Unfortunately, the resulting estimates are known to be sensitive to 
this nontrivial choice [Gelman (2006)]. We provide in the online supple¬ 
ment [Kuusela and Panaretos (2015)] a comparison of empirical Bayes and 
hierarchical Bayes for a number of uninformative hyperpriors. Our results 
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indicate that the relative point estimation performance of the methods de¬ 
pends strongly on the choice of the hyperprior, especially when there is only 
a limited amount of data. For example, in the small sample size experi¬ 
ment of Section 5, the integrated squared error of hierarchical Bayes ranges 
from 17% better to 30% worse than that of empirical Bayes depending on 
the choice of the hyperprior. Unfortunately, there are no guarantees that the 
uninformative hyperprior that worked the best in this example would always 
provide the best performance. These results also indicate that hyperpriors 
which were designed to be uninformative are actually fairly informative in 
hierarchical Bayes unfolding and have an undesirably large influence on the 
unfolded results, while empirical Bayes achieves comparable point estimation 
performance without the need to make any arbitrary distributional assump¬ 
tions on 5. The hierarchical Bayes credible intervals also suffer from similar 
coverage issues as the empirical Bayes credible intervals. 

The other main component of our approach is frequentist uncertainty 
quantification using an iterative bias-correction technique. We have shown 
that the bias-correction is crucial for establishing close-to-nominal frequen¬ 
tist coverage and that, at least within the context of our simulation study, 
the approach outperforms existing techniques based on bootstrap resam¬ 
pling and Bayesian credible intervals. The good performance of this approach 
raises many interesting questions for future work. For example, there seems 
to be a trade-off between the coverage and the length of the bias-corrected 
intervals and this trade-off is governed by the number of bias-correction it¬ 
erations -/Vgc- That is, it appears that one is able to obtain consistently 
better coverage by running more bias-correction iterations at the expense 
of increased interval length. By studying the theoretical properties of the 
iteration, one might be able to provide guidance on how to choose IVbc in 
such a way that it optimizes the length of the intervals while maintaining 
the coverage within a preset threshold of the nominal value. 

In this work, we have focused on the most basic inferential tool in non- 
parametric statistics, that is, we have aimed at building pointwise confidence 
bands for the unknown intensity / with good frequentist coverage prop¬ 
erties. These bands can be directly used for making pointwise inferential 
statements, such as testing if the data are consistent with some theoretical 
prediction of / at a single point s € E chosen before seeing the data. They 
also form the basis for making more complicated inferences. For example, 
one can envisage using the bands for testing whether the data are consistent 
with a theoretical prediction at several locations or over the whole spectrum 
by making a multiple testing correction. The appropriate form of the infer¬ 
ential statement depends, however, on the original purpose of the unfolding 
operation and each purpose listed in Section 2.2 involves different types of 
inferential goals, including the comparison and combination of several un¬ 
folded measurements. Extensions of the tools studied in this paper to such 
more complicated situations constitutes an important topic for future work. 
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It should also be noted that throughout this paper the confidence inter¬ 
vals are computed for a fixed forward mapping K. while in reality there 
is usually a systematic uncertainty associated with K stemming from the 
fact that K is typically estimated using either simulations or some auxil¬ 
iary measurements. In cases where the detector response can be modeled 
using a parametric model, such as the Z boson example of Section 6, the 
unknown parameters can be estimated using maximum likelihood on data 
generated by a known true intensity. More generally, however, one may need 
to consider nonparametric estimates of K. Once an estimate of K has been 
obtained, it is likely that its uncertainty can be incorporated into the outer 
bootstrap loop of the bias-corrected confidence intervals. A detailed study 
of these ideas will be the subject of future work by the authors. 

Finally, it should be pointed out that it is possible to find situations where 
the proposed unfolding methodology will not yield good reconstructions. 
This happens when the smoothness penalty, that is, penalizing for large val¬ 
ues of ||/"|||, is not the appropriate way of regularizing the problem. For 
instance, if the true intensity / contains sharp peaks or rapid oscillations, 
the solution would potentially be biased to an extent where the iterative 
bias-correction would be unable to adequately improve the situation. In par¬ 
ticular, the penalty would be unlikely to adapt well to situations where the 
magnitude of the second derivative varies significantly across the spectrum. 
This is the case, for example, when the intensity contains features at very 
different spatial scales or when it corresponds to a steeply falling spectrum 
that varies over many orders of magnitude. Naturally, in these cases, a more 
suitable choice of the family of regularizing priors {p(/3|<5)}<s>o should fix the 
problem. For example, in the case of a steeply falling spectrum, one could 
consider penalizing for the second derivative of log / instead of / or using 
a spatially adaptive smoothness penalty [Pintore, Speckman and Holmes 
(2006)]. These considerations highlight the fact that all the inferences con¬ 
sidered here are contingent on the chosen family of regularizing priors and 
should be interpreted with this in mind. 
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SUPPLEMENTARY MATERIAL 

Supplement to “Statistical unfolding of elementary particle spectra: Em¬ 
pirical Bayes estimation and bias-corrected uncertainty quantification” (DOI: 

10.1214/15-AOAS857SUPP; .pdf). The supplement provides a comparison 
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of the empirical Bayes and hierarchical Bayes approaches to unfolding; ad¬ 
ditional simulations results complementing those of Section 5; and technical 
material on the convergence and mixing of the MCMC sampler and on the 
Gaussian approximation used in the coverage studies. 
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